wip improved object retrieval #10513

logan-markewich · 2024-02-08T01:49:28Z

A WIP approach to improve recursive retrieval.

Basically, if index_node.obj is serializable, it is stored directly in the vector db/storage layer along with the node, so no mapping is needed.
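
To make the idea concrete, here is a minimal sketch of the storage-side split. It does not use the llama_index classes: SketchTextNode, SketchIndexNode and split_for_storage are hypothetical stand-ins invented for illustration, and the "has a to_dict method" check is an assumed proxy for "is serializable", not necessarily the check this PR uses.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class SketchTextNode:
    """Hypothetical stand-in for a serializable node."""
    text: str

    def to_dict(self) -> Dict[str, Any]:
        return {"text": self.text}


@dataclass
class SketchIndexNode:
    """Hypothetical stand-in for IndexNode: a summary plus an attached object."""
    index_id: str
    text: str
    obj: Any = None

    def to_dict(self) -> Dict[str, Any]:
        data: Dict[str, Any] = {"index_id": self.index_id, "text": self.text, "obj": None}
        # Assumption: an object counts as serializable if it exposes to_dict().
        if hasattr(self.obj, "to_dict"):
            data["obj"] = self.obj.to_dict()
        return data


def split_for_storage(nodes: List[SketchIndexNode]) -> Tuple[List[str], Dict[str, Any]]:
    """Return JSON payloads for the store and a map for objects that could not be embedded."""
    payloads: List[str] = []
    object_map: Dict[str, Any] = {}
    for node in nodes:
        payload = node.to_dict()
        if node.obj is not None and payload["obj"] is None:
            # Not serializable: keep it in memory, keyed by index id.
            object_map[node.index_id] = node.obj
        payloads.append(json.dumps(payload))
    return payloads, object_map


good = SketchIndexNode("good1", "good summary", obj=SketchTextNode("good_node"))
bad = SketchIndexNode("bad1", "bad summary", obj=lambda query: [])  # stands in for a retriever
payloads, object_map = split_for_storage([good, bad])
```

In this sketch the payload for good1 carries its object inline, while bad1 is stored without one and its object lives only in object_map.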

Example Usage

from llama_index import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.schema import IndexNode, TextNode
from llama_index.vector_stores import QdrantVectorStore

index = VectorStoreIndex(nodes=[TextNode(text="bad_node")])

# this isn't serializable, and is maintained as a mapping
bad_node = IndexNode(index_id="bad1", obj=index.as_retriever(), text="bad summary")

# this is serializable, no mapping needed
good_node = IndexNode(index_id="good1", obj=TextNode(text="good_node"), text="good_summary")

index2 = VectorStoreIndex(
    nodes=[good_node],
    objects=[bad_node],
)
nodes = index2.as_retriever(verbose=True).retrieve("test")

print(nodes[0].text)  # -> "good_node"
print(nodes[1].text)  # -> "bad_node"

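# save (writes to the default persist directory, ./storage)
index2.storage_context.persist()

# loading -- need to provide unserializable objects at load time
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index2 = load_index_from_storage(storage_context, objects=[bad_node])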

TODO

unit tests
confirm it works with vector dbs

jerryjliu · 2024-02-08T05:25:26Z

llama_index/core/base_retriever.py

@@ -144,7 +167,7 @@ def _handle_recursive_retrieval(
            node = n.node
            score = n.score or 1.0
            if isinstance(node, IndexNode):
-                obj = self.object_map.get(node.index_id, None)
+                obj = node.obj or self.object_map.get(node.index_id, None)


Help me understand: is object_map now only used for retrievers/query engines?

If it's a Node, it should now be serialized/deserialized directly on the IndexNode, right?

At a high level, once we make retrievers/query engines serializable, I was thinking object_map would go away and we'd replace it with a proper docstore.

logan-markewich · 2024-02-08

Yes, if query engines/retrievers were serializable, this would go away.

Right now, unserializable index nodes have to be passed in under the objects kwarg. From there, we build a map of index id to object.

Then we can serialize and retrieve the index node without the object.

When an index node is retrieved and has no object attached, the object map is checked for its object.
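
The retrieval-side lookup described above can be sketched as follows. This is an illustration only: LoadedIndexNode, build_object_map and resolve_object are hypothetical names, and only the `node.obj or object_map.get(...)` expression is taken from the diff.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class LoadedIndexNode:
    """Hypothetical stand-in for an IndexNode read back from storage."""
    index_id: str
    text: str
    obj: Optional[Any] = None  # set only if the object was serialized with the node


def build_object_map(objects: List[LoadedIndexNode]) -> Dict[str, Any]:
    """Map index id -> object for nodes supplied through the objects kwarg."""
    return {node.index_id: node.obj for node in objects}


def resolve_object(node: LoadedIndexNode, object_map: Dict[str, Any]) -> Optional[Any]:
    # Same expression as the changed line in the diff:
    # prefer the object attached to the node, else fall back to the map.
    return node.obj or object_map.get(node.index_id, None)


def fake_retriever(query: str) -> List[str]:
    """Stands in for an unserializable retriever."""
    return [f"result for {query}"]


# Objects the caller re-supplies at load time.
object_map = build_object_map([LoadedIndexNode("bad1", "bad summary", obj=fake_retriever)])

stored_good = LoadedIndexNode("good1", "good summary", obj="good_node")
stored_bad = LoadedIndexNode("bad1", "bad summary")  # object was not persisted

resolved_good = resolve_object(stored_good, object_map)
resolved_bad = resolve_object(stored_bad, object_map)
```

One property of the `or` form is that an attached object that evaluates as falsy (for example, an empty container) is treated as missing and the map is consulted instead.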

hatianzhang · 2024-02-16T23:45:53Z

Nice, will update some notebooks!

wip

b9d8c39

dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 8, 2024

hatianzhang reviewed Feb 8, 2024

jerryjliu reviewed Feb 8, 2024

logan-markewich added 5 commits February 11, 2024 19:02

Merge branch 'main' into logan/serialize_recursive_retriever

0169957

address comments

8c3e773

Merge branch 'main' into logan/serialize_recursive_retriever

0295ed1

remove pdb

7e327e6

fix small errors

aeb8e1e

logan-markewich merged commit 3546490 into main Feb 16, 2024
8 checks passed

logan-markewich deleted the logan/serialize_recursive_retriever branch February 16, 2024 23:16

fix litellm tests

affd5b2

Dominastorm pushed a commit to uptrain-ai/llama_index that referenced this pull request Feb 28, 2024

wip improved object retrieval (run-llama#10513)

ef081ef

anoopshrma pushed a commit to anoopshrma/llama_index that referenced this pull request Mar 2, 2024

wip improved object retrieval (run-llama#10513)

b8dd271

Izukimat pushed a commit to Izukimat/llama_index that referenced this pull request Mar 29, 2024

wip improved object retrieval (run-llama#10513)

304434e
